苏生不惑 has written another small tool
This is 苏生不惑's 371st original article. Star this public account to see new articles as soon as they are published.
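I previously shared some of the tools I wrote in 整理下苏生不惑开发过的那些软件和脚本 (a roundup of the software and scripts 苏生不惑 has developed). The export produced too many public-account PDF files, and I wanted to merge them into one, so I merged them with PDFShaper. The merged PDF, however, had no bookmarks.
I also downloaded all the audio: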
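import re
import sys

import requests

topic_url = 'xxx'  # replace with the URL of the album (topic) page
# The original does not show `headers`; a browser-like User-Agent is assumed here.
headers = {'User-Agent': 'Mozilla/5.0'}

biz = re.search(r'__biz=(.*?)&', topic_url).group(1)
album_id = re.search(r'album_id=(.*?)&', topic_url).group(1)
response = requests.get(topic_url, headers=headers)
# Non-greedy matches, so each attribute value stops at its own closing quote
voiceids = re.findall(r'data-voiceid="(.*?)"', response.text)
msgids = re.findall(r'data-msgid="(.*?)"', response.text)
links = re.findall(r'data-link="(.*?)"', response.text)
titles = re.findall(r'data-title="(.*?)" data-voiceid', response.text)
print(titles, len(voiceids))

for i, j in zip(titles, voiceids):
    voice_url = f'https://res.wx.qq.com/voice/getvoice?mediaid={j}'
    # print(i, voice_url)
    audio_data = requests.get(voice_url, headers=headers)
    print('Downloading audio: ' + i + '.mp3')
    with open(i + '.mp3', 'wb') as f:
        f.write(audio_data.content)
sys.exit(0)  # stop here; status 0 means the run finished normally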
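Article titles can contain characters such as `/` or `?` that are not valid in file names, in which case opening `i+'.mp3'` for writing would fail. A minimal sketch of one way to guard against that, assuming a hypothetical helper `safe_filename` (not part of the original script) that replaces the characters Windows rejects:

```python
import re


def safe_filename(title, replacement="_"):
    """Return `title` with characters that are invalid in Windows file names replaced.

    Hypothetical helper for illustration; the original script uses the title as-is.
    """
    cleaned = re.sub(r'[\\/:*?"<>|\r\n\t]', replacement, title)
    # Trailing dots and spaces are also rejected by Windows
    cleaned = cleaned.strip().rstrip(". ")
    return cleaned or "untitled"


print(safe_filename('What is "AI"? Part 1/2') + ".mp3")
```

In the download loop this would be used as `open(safe_filename(i) + '.mp3', 'wb')`.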
Download result:
The downloaded article HTML files are first converted to PDF:
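import os

import pdfkit  # needs the wkhtmltopdf binary to be installed
from pdf2docx import Converter  # assumed source of Converter; the original omits the import


def to_pdf():
    print('Exporting PDF...')
    os.makedirs('pdf', exist_ok=True)  # the output folder must exist
    for root, dirs, files in os.walk('.'):
        for name in files:
            if name.endswith('.html'):
                print(name)
                try:
                    # Join with root so files in subfolders are found
                    pdfkit.from_file(os.path.join(root, name),
                                     'pdf/' + name.replace('.html', '') + '.pdf')
                except Exception as e:
                    print(e)


def to_word():
    print('Exporting Word...')
    os.makedirs('word', exist_ok=True)
    for root, dirs, files in os.walk('.'):
        for name in files:
            if name.endswith('.pdf'):
                print(name)
                try:
                    cv = Converter(os.path.join(root, name))
                    cv.convert('word/' + name.replace('.pdf', '') + '.docx')
                    cv.close()
                except Exception as e:
                    print(e)


to_pdf()
# to_word()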
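Because `os.walk` yields file names relative to each folder it visits, the source path has to be built from `root` and `name`. The sketch below only computes the (source, target) path pairs and converts nothing; `plan_conversions` and its parameters are hypothetical names used for illustration.

```python
import os


def plan_conversions(top, src_ext, out_dir, out_ext):
    """List (source, target) path pairs for every file under `top` ending in `src_ext`.

    Hypothetical helper: it only computes paths and does not convert anything.
    """
    pairs = []
    for root, dirs, files in os.walk(top):
        for name in sorted(files):
            if name.endswith(src_ext):
                stem = name[: -len(src_ext)]
                pairs.append((os.path.join(root, name),
                              os.path.join(out_dir, stem + out_ext)))
    return pairs


if __name__ == "__main__":
    for src, dst in plan_conversions(".", ".html", "pdf", ".pdf"):
        print(src, "->", dst)
```

Printing the plan first makes it easy to check which files will be picked up before starting a long conversion run.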
Then the converted PDFs are merged into a single file and bookmarks are generated:
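import html
import os

# Legacy camelCase API; it was removed in PyPDF2 3.0, so this needs PyPDF2 < 3.0
from PyPDF2 import PdfFileReader, PdfFileWriter

output_name = '公众号苏生不惑历史文章合集.pdf'
file_writer = PdfFileWriter()
num = 0  # index of the next page in the merged file
for root, dirs, files in os.walk('.'):
    for name in files:
        # Skip the merged file itself so that a rerun does not include it
        if name.endswith('.pdf') and name != output_name:
            print(name)
            file_reader = PdfFileReader(os.path.join(root, name))
            start = num  # first page of this article in the merged file
            for page in range(file_reader.getNumPages()):
                num += 1
                file_writer.addPage(file_reader.getPage(page))
            # Add the bookmark only after its target page exists in the writer
            if num > start:
                file_writer.addBookmark(html.unescape(name).replace('.pdf', ''), start, parent=None)
with open(output_name, 'wb') as f:
    file_writer.write(f)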
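Each bookmark has to point at the first page of its article in the merged file, which is the running total of the page counts of all articles before it. A small sketch of that bookkeeping, independent of any PDF library; `bookmark_offsets` and the sample page counts are made up for illustration:

```python
def bookmark_offsets(page_counts):
    """Map (title, page_count) pairs to (title, zero-based start page) pairs."""
    offsets = []
    start = 0
    for title, count in page_counts:
        if count > 0:  # an empty document has no page to point at
            offsets.append((title, start))
        start += count
    return offsets


# Made-up page counts for three articles
sample = [("article-a", 3), ("article-b", 1), ("article-c", 5)]
for title, start in bookmark_offsets(sample):
    print(title, start)
```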
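# Export the bookmarks of the merged PDF to a CSV file
from PyPDF2 import PdfFileReader  # legacy API, needs PyPDF2 < 3.0


def bookmark_export(reader, lines):
    """Return the bookmarks as 'title,page' lines (pages are 1-based)."""
    bookmark = ''
    for line in lines:
        if isinstance(line, dict):
            # '/Page' usually holds a reference to the page object rather than
            # a number, so the page index is looked up through the reader
            page = reader.getDestinationPageNumber(line)
            bookmark += line['/Title'] + ',' + str(page + 1) + '\n'
        else:
            # Nested list of child bookmarks: keep what the recursion returns
            bookmark += bookmark_export(reader, line)
    return bookmark


with open('公众号苏生不惑历史文章合集.pdf', 'rb') as f:
    reader = PdfFileReader(f)
    bookmark = bookmark_export(reader, reader.getOutlines())
# 'w' instead of 'a+', so a rerun does not append duplicate rows
with open('公众号苏生不惑历史文章合集.csv', 'w', encoding='utf-8-sig') as f:
    f.write(bookmark)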
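The export above joins title and page with a plain comma, so a title that itself contains a comma would produce an extra column. One possible alternative, sketched here with the standard `csv` module, quotes such fields automatically; `bookmarks_to_csv` and the sample rows are hypothetical.

```python
import csv
import io


def bookmarks_to_csv(rows):
    """Serialize (title, page) rows to CSV text, quoting fields where needed."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up rows; the second title contains a comma
print(bookmarks_to_csv([("First article", 1), ("Tips, tricks and tools", 4)]))
```

When writing straight to a file instead of a string buffer, the `csv` documentation recommends opening the file with `newline=''`.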